The Ising channel, which was introduced in 1990, is a channel with memory that models inter-symbol interference. In this paper we consider the Ising channel with feedback and find the capacity of the channel together with a capacity-achieving coding scheme. To calculate the channel capacity, an equivalent dynamic programming (DP) problem is formulated and solved. Using the DP solution, we establish that the feedback capacity is $C = \frac{2H_b(a)}{3+a} \approx 0.575522$, where $a$ is a particular root of a fourth-degree polynomial and $H_b(x)$ denotes the binary entropy function. Simultaneously, $a = \arg\max_{0 \le x \le 1} \frac{2H_b(x)}{3+x}$. Finally, a simple, error-free, capacity-achieving coding scheme is provided, and a strong connection between the DP results and the coding scheme is outlined.

Index Terms

Bellman Equation, dynamic program, feedback capacity, Ising channel, infinite-horizon, value iteration.
I. Introduction

The Ising model originated as a problem in statistical mechanics. It was invented by Lenz in 1920 [1], who gave it as a problem to his student, Ernst Ising, after whom it is named [2]. A few years later the two-dimensional Ising model was solved analytically by Onsager [3]. The Ising channel, on the other hand, was introduced as an information theory problem by Berger and Bonomi in 1990 [4]. It received this name due to its resemblance to the physical Ising model.

The Ising channel works as follows: at time $t$ a certain bit, $x_t$, is transmitted through the channel. The channel output at time $t$ is denoted by $y_t$. If $x_t = x_{t-1}$ then $y_t = x_t$ with probability 1. If $x_t \neq x_{t-1}$ then $y_t$ is distributed Bernoulli$(\frac{1}{2})$.
In their work on the Ising channel, Berger and Bonomi found the zero-error capacity and a numerical approximation of the capacity of the Ising channel without feedback. In order to find the numerical approximation, the Blahut-Arimoto algorithm was used [5], [6]. The capacity was found to be bounded by $0.5031 \le C \le 0.6723$ and the zero-error capacity was found to be 0.5 bit per channel use. Moreover, their work contains a simple coding scheme that achieves the zero-error capacity. This code is the basis for the capacity-achieving coding scheme in the presence of feedback presented in this paper.

We consider the Ising channel with feedback, which models a channel with inter-symbol interference (ISI). The objective is to find the channel feedback capacity explicitly, and to provide a simple, capacity-achieving coding scheme. Finding an explicit expression for the capacity of non-trivial channels with memory, with or without feedback, is usually a very hard problem. There are only a few cases in the literature that have been solved, such as additive Gaussian channels with memory without feedback ("water filling solution", [7], [8]), additive Gaussian channels with feedback where the noise is ARMA of order 1 [9], channels with memory where the state is known both to the encoder and the decoder [10], [11], and the trapdoor channel with feedback [12]. This paper adds one additional case, the Ising channel.

Towards this goal, we start from the characterization of the feedback capacity as the normalized directed information $\frac{1}{n} I(X^n \to Y^n)$. The directed information was introduced two decades ago by Massey [13] (who attributed it to Marko [14]) as

$$I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}). \tag{1}$$

Massey [13] showed that the normalized maximum directed information upper bounds the capacity of channels with feedback. Subsequently, it was shown that directed information, as defined by Massey, indeed characterizes the capacity of channels with feedback [15]-[21].

The capacity of the Ising channel with feedback was approximated numerically in [22] using an extension of the Blahut-Arimoto algorithm for directed information. Here, we present the explicit expression together with a simple capacity-achieving coding scheme. The main difficulty in calculating the feedback capacity explicitly is that it is given by an optimization of an infinite-letter expression. In order to overcome this difficulty we transform the normalized directed information optimization problem into an infinite-horizon average-reward dynamic programming problem. The idea of using dynamic programming (DP) for computing the directed information capacity has been introduced and applied in several recent papers such as [11], [12], [17], [23]. The DP used here most resembles the one used for the trapdoor channel model [12]. We use a DP method that is specific to the Ising channel rather than the trapdoor channel, and provide an analytical solution for the new specific DP.

It turns out that the DP not only helps in computing the feedback capacity but also provides important information regarding a coding scheme that achieves the capacity. Through the DP formulation and its solution we are able to derive a simple and concrete coding scheme that achieves the feedback capacity. The states and the actions of the dynamic program turn out to include exact instructions for what the encoder and the decoder do to achieve the feedback capacity.

The remainder of the paper is organized as follows: in Section II we present some notations which are used throughout the paper, basic definitions, and the channel model. In Section III we present the main results. In Section IV we present the outline of the method used to calculate the channel capacity. In this section we briefly explain DP and the Bellman Equation, which is used in order to find the capacity. In Section V we compute the feedback capacity using a value iteration algorithm. In Section VI an analytical solution to the Bellman Equation is found. Section VII contains the connection between the DP results and the coding scheme. From this connection we can derive the coding scheme explicitly. In Section VIII we prove that the suggested coding scheme indeed achieves the capacity. Section IX contains conclusions and a discussion of the results.

II. Notations, definitions and channel model 

A. Notations 

• Calligraphic letters, $\mathcal{X}$, denote alphabet sets, upper-case letters, $X$, denote random variables, and lower-case letters, $x$, denote sample values.

• Superscript, $x^t$, denotes the vector $(x_1, \ldots, x_t)$.

• The probability distribution of a random variable, $X$, is denoted by $p_X$. We omit the subscript of the random variable when the arguments have the same letter as the random variable, e.g. $p(x|y) = p_{X|Y}(x|y)$.

The notations related to the channel are presented in Table I.
TABLE I
Frequently used notations.

| Notation | Meaning |
|----------|---------|
| $t$ | Time ($\in \mathbb{N}$) |
| $x_t$ | Channel input at time $t$ ($\in \mathcal{X}$) |
| $s_t$ | Channel state at time $t$ ($\in \mathcal{S}$) |
| $y_t$ | Channel output at time $t$ ($\in \mathcal{Y}$) |

B. Definitions 

Here we present some basic definitions beginning with a definition of a finite state channel (FSC). 

Definition 1 (FSC). [24, ch. 4] An FSC is a channel that has a finite number of possible states and has the property: $p(y_t, s_t | x^t, s^{t-1}, y^{t-1}) = p(y_t, s_t | x_t, s_{t-1})$.

Definition 2 (Unifilar FSC). [24, ch. 4] An FSC is called a unifilar FSC if there exists a time-invariant function $f(\cdot)$ such that $s_t = f(s_{t-1}, x_t, y_t)$.

Definition 3 (Connected FSC). [24, ch. 4] An FSC is called a connected FSC if

$$\forall s, s' \in \mathcal{S} \;\; \exists\, T_s \in \mathbb{N} \text{ and } \{p(x_t|s_{t-1})\}_{t=1}^{T_s} \text{ such that } \sum_{t=1}^{T_s} p_{S_t|S_0}(s|s') > 0.$$

In other words, for any given states $s, s'$ there exist an integer $T_s$ and an input distribution $\{p(x_t|s_{t-1})\}_{t=1}^{T_s}$ such that the probability of the channel reaching the state $s$ from the state $s'$ is positive.

C. Channel model 

In this part, the Ising channel model is introduced. The channel is a unifilar FSC with feedback, as depicted in Fig. 1. As mentioned before, the sets $\mathcal{X}, \mathcal{Y}, \mathcal{S}$ denote the input, output, and state alphabets, respectively. In the Ising channel model, $\mathcal{X} = \mathcal{Y} = \mathcal{S} = \{0, 1\}$.

The Ising channel consists of two different topologies, as described in Fig. 2. The channel topologies depend on the channel state and are denoted by Z and S. These Z and S notations are compatible with the well-known Z and S channels. The channel topology at time $t$ is determined by $s_{t-1} \in \{0, 1\}$. As shown in Fig. 2, if $s_{t-1} = 1$, the channel is in the Z topology; if $s_{t-1} = 0$, the channel is in the S topology. The channel state at time $t$ is defined as the input to the channel at time $t$, meaning $s_t = x_t$.
Fig. 1. Unifilar finite state channel with feedback of unit delay.

Fig. 2. The Ising channel model. On the left we have the Z topology ($s_{t-1} = 1$); and on the right we have the S topology ($s_{t-1} = 0$).



The channel input, $x_t$, and state, $s_{t-1}$, have a crucial effect on the output, $y_t$. If the input is identical to the previous state, i.e. $x_t = s_{t-1}$, then the output is equal to the input, $y_t = x_t$, with probability 1; if $x_t \neq s_{t-1}$ then $y_t$ can be either 0 or 1, each with probability 0.5. This effect is summarized in Table II.

We assume a communication setting that includes feedback. The feedback has unit delay. Hence, at time $t$ the transmitter (encoder) knows the message $m$ and the feedback samples $y^{t-1}$. Therefore, the input of the channel, $x_t$, is a function of both the message and the feedback, as shown in Fig. 1.
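The channel law described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the names `ising_channel` and `estimate_p_equal` are assumptions introduced here, and the Monte Carlo estimates should only approach the entries of Table II up to sampling noise.

```python
import random

def ising_channel(x, s_prev, rng):
    """One use of the Ising channel.

    x      : current input bit x_t
    s_prev : previous state s_{t-1} (equal to the previous input x_{t-1})
    Returns (y_t, s_t); the new state s_t is simply the input x_t.
    """
    if x == s_prev:
        y = x                  # input repeats the state: noiseless output
    else:
        y = rng.randint(0, 1)  # input differs from the state: fair coin
    return y, x

def estimate_p_equal(x, s_prev, trials=20000, seed=0):
    """Monte Carlo estimate of p(y_t = x_t | x_t, s_{t-1})."""
    rng = random.Random(seed)
    hits = sum(ising_channel(x, s_prev, rng)[0] == x for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for s_prev in (0, 1):
        for x in (0, 1):
            print("s_prev =", s_prev, "x =", x, "estimate =", estimate_p_equal(x, s_prev))
```

The two cases with `x == s_prev` return exactly 1, while the other two fluctuate around 0.5.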

TABLE II
The channel states, topologies, and inputs, together with the probability that the output at time $t$, $y_t$, is equal to the input at time $t$, $x_t$.

| $s_{t-1}\,(= x_{t-1})$ | Topology | $x_t$ | $p(y_t = x_t \mid x_t, s_{t-1})$ |
|---|---|---|---|
| 0 | S | 0 | 1 |
| 0 | S | 1 | 0.5 |
| 1 | Z | 0 | 0.5 |
| 1 | Z | 1 | 1 |

Lemma 1. The Ising channel is a connected unifilar FSC.

Proof: Lemma 1 is proved in three steps. At each step we show a different property of the channel.

(a) The channel is an FSC since it has two states, 0 and 1. Moreover, the output probability is a function of the input and the previous state. Hence, it is clear that $p(y_t, s_t | x^t, s^{t-1}, y^{t-1}) = p(y_t, s_t | x_t, s_{t-1})$, since $s_t = x_t$ and $y_t$ depends only on $x_t$ and $s_{t-1}$. To be more accurate, we can write $p(y_t, s_t | x^t, s^{t-1}, y^{t-1}) = p(y_t, s_t | x_t, s_{t-1}) = p(y_t, s_t | x_t, x_{t-1})$.

(b) The channel is a unifilar FSC since (i) it is an FSC and (ii) $s_t = x_t$. Obviously, $s_t = f(s_{t-1}, x_t, y_t) = x_t$ is a time-invariant function.

(c) The channel is a connected FSC due to the fact that $s_t = x_t$. Thus, one can take $T_s = 1$ and $p_{X_1|S_0}(s|s') = 1$, resulting in $\Pr(S_1 = s | S_0 = s') = 1 > 0$.

■

III. Main Results 

Theorem 1. (a) The capacity of the Ising channel with feedback is $C_f = \frac{2H_b(a)}{3+a} \approx 0.5755$, where $a \approx 0.4503$ is a specific root of the fourth-degree polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$.
(b) The capacity, $C_f$, is also equal to $\max_{0 \le z \le 1} \frac{2H_b(z)}{3+z}$, where $a = \arg\max_{0 \le x \le 1} \frac{2H_b(x)}{3+x} \approx 0.4503$.
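The two characterizations in Theorem 1 can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the helper names (`hb`, `poly`, `root_in_unit_interval`, `rate`) are introduced here, the root is located by plain bisection on $[0,1]$, and part (b) is checked by a coarse grid search.

```python
import math

def hb(p):
    """Binary entropy function H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def poly(x):
    """The fourth-degree polynomial of Theorem 1(a)."""
    return x**4 - 5 * x**3 + 6 * x**2 - 4 * x + 1

def root_in_unit_interval(tol=1e-12):
    """Bisection on [0, 1]; it applies because poly(0) = 1 > 0 and poly(1) = -1 < 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poly(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate(x):
    """The objective 2 H_b(x) / (3 + x) of Theorem 1(b)."""
    return 2.0 * hb(x) / (3.0 + x)

a = root_in_unit_interval()
capacity = rate(a)
# Coarse grid search for the maximizer in Theorem 1(b).
grid_argmax = max((k / 10000 for k in range(10001)), key=rate)

if __name__ == "__main__":
    print("a =", a, "capacity =", capacity, "grid argmax =", grid_argmax)
```

The agreement is not a coincidence: setting the derivative of $2H_b(x)/(3+x)$ to zero gives $(1-x)^4 = x^3$, which expands to the polynomial in part (a).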

Theorem 2. There is a simple capacity-achieving coding scheme, which follows these rules:

(i) Assume the message is a stream of $n$ bits distributed i.i.d. with probability 0.5.

(ii) Transform the message into a stream of bits where the probability of alternation between 0 and 1 and vice versa is $(1 - a)$, where $a \approx 0.4503$.

(iii) Since the encoder may send some bits twice, the channel output at time $t$ does not necessarily correspond to the $t$-th message bit; therefore, we work with two time frames. The message time frame is denoted by $t$, while the encoder's and the decoder's time frame is denoted by $t'$. In other words, we denote the $t$-th message bit by $m_t$, where it corresponds to the $t'$-th encoder's entry. Following these rules:

(1) Encoder: At time $t'$, the encoder knows $s_{t'-1} = x_{t'-1}$, and we send the bit $m_t$ ($x_{t'} = m_t$):

(1.1) If $y_{t'} \neq s_{t'-1}$ then move to the next bit, $m_{t+1}$. This means that we send $m_t$ once.

(1.2) If $y_{t'} = s_{t'-1}$ then $x_{t'} = x_{t'+1} = m_t$, which means that the encoder sends $m_t$ twice (at time $t'$ and $t'+1$) and then moves to the next bit.

(2) Decoder: At time $t'$, assume the state $s_{t'-1}$ is known at the decoder, and we are to decode the bit $\hat{m}_t$:

(2.1) If $y_{t'} \neq s_{t'-1}$ then $\hat{m}_t = y_{t'}$ and $s_{t'} = y_{t'}$.

(2.2) If $y_{t'} = s_{t'-1}$ then wait for $y_{t'+1}$; $\hat{m}_t = y_{t'+1}$ and $s_{t'} = y_{t'+1}$.
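The encoder and decoder rules of Theorem 2 can be simulated directly. The following is a minimal sketch under stated assumptions: the function names are introduced here for illustration, and the transformation of rule (ii) from i.i.d. uniform bits into the biased stream is not implemented. Instead, `markov_bits` draws the already-transformed stream, in which each bit differs from its predecessor with probability $1 - a$.

```python
import math
import random

def channel(x, s_prev, rng):
    """Ising channel: noiseless if the input repeats the state, a fair coin otherwise."""
    return x if x == s_prev else rng.randint(0, 1)

def markov_bits(n, a, rng, start=0):
    """Bits that flip relative to the previous bit with probability 1 - a."""
    bits, prev = [], start
    for _ in range(n):
        if rng.random() < 1.0 - a:
            prev ^= 1
        bits.append(prev)
    return bits

def run_scheme(bits, rng, s0=0):
    """Run encoder rules (1.1)-(1.2) and decoder rules (2.1)-(2.2) side by side.

    Returns the decoded bits and the total number of channel uses.
    """
    state = s0               # s_{t'-1}, shared by both sides thanks to feedback
    decoded, uses = [], 0
    for m in bits:
        y = channel(m, state, rng)
        uses += 1
        if y != state:
            # (1.1)/(2.1): the output differs from the state, so it must equal m.
            decoded.append(y)
        else:
            # (1.2)/(2.2): ambiguous output; m is sent again and is now noiseless.
            y = channel(m, m, rng)
            uses += 1
            decoded.append(y)
        state = decoded[-1]  # the decoder's state estimate, which equals m
    return decoded, uses

if __name__ == "__main__":
    a = 0.4503
    rng = random.Random(0)
    msg = markov_bits(100000, a, rng)
    out, uses = run_scheme(msg, rng)
    hb_a = -a * math.log2(a) - (1 - a) * math.log2(1 - a)
    print("error free:", out == msg, "empirical rate:", hb_a * len(msg) / uses)
```

As a back-of-the-envelope check, a bit is sent once only when it differs from the state and the coin flip reveals it, which happens with probability $(1-a)/2$, so the expected number of channel uses per bit is $(3+a)/2$. Dividing the $H_b(a)$ bits of information carried per stream bit by this number gives $2H_b(a)/(3+a)$, the expression in Theorem 1.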

IV. Method Outline, Dynamic Program, and the Bellman Equation 
In order to formulate an equivalent dynamic program we use the following theorem: 

Theorem 3. [12] The feedback capacity of a connected unifilar FSC, when the initial state $s_0$ is known at the encoder and the decoder, can be expressed as

$$C_{fb} = \sup_{\{p(x_t|s_{t-1},y^{t-1})\}_{t \ge 1}} \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t \mid Y^{t-1}), \tag{2}$$

where $\{p(x_t|s_{t-1},y^{t-1})\}_{t \ge 1}$ denotes the set of all distributions such that $p(x_t|y^{t-1}, s^{t-1}) = p(x_t|s_{t-1}, y^{t-1})$ for $t = 1, 2, \ldots$.

Using Theorem 3 we can formulate the feedback capacity problem as an infinite-horizon average-reward dynamic program. Then, using the Bellman Equation, we find the optimal average reward, which gives us the channel capacity.

A. Dynamic Programs

Here we introduce a formulation for average-reward dynamic programs. Each DP problem is defined by a septuple $(\mathcal{Z}, \mathcal{U}, \mathcal{W}, F, P_z, P_w, g)$, where all the functions considered are assumed to be measurable:
$\mathcal{Z}$ : is a Borel space which contains the states.
$F$ : is the function according to which the discrete-time dynamic system evolves: $z_t = F(z_{t-1}, u_t, w_t)$, $t \in \mathbb{N}$ (where each state $z_t \in \mathcal{Z}$).
$\mathcal{U}$ : is a compact subset of a Borel space, which contains the actions $u_t$.
$\mathcal{W}$ : is a measurable space, which contains the disturbances $w_t$.
$P_z$ : is a distribution from which the initial state, $z_0$, is drawn.
$P_w$ : is the distribution from which each disturbance, $w_t$, is drawn in accordance with $P_w(\cdot|z_{t-1}, u_t)$, which depends only on the state $z_{t-1}$ and action $u_t$.
$g$ : is a bounded reward function.

We consider a discrete-time dynamic system evolving according to

$$z_t = F(z_{t-1}, u_t, w_t), \quad t = 1, 2, 3, \ldots \tag{3}$$

The history, $h_t = (z_0, w_0, \ldots, w_{t-1})$, summarizes information available prior to the selection of the $t$-th action. The action, $u_t$, is selected by a function, $\mu_t$, which maps histories to actions. In particular, given a policy, $\pi = \{\mu_1, \mu_2, \ldots\}$, actions are generated according to $u_t = \mu_t(h_t)$.

The objective is to maximize the average reward, given a bounded reward function $g : \mathcal{Z} \times \mathcal{U} \to \mathbb{R}$. The average reward for a policy $\pi$ is defined by

$$\rho_\pi = \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_\pi \left\{ \sum_{t=0}^{N-1} g(Z_t, \mu_{t+1}(h_{t+1})) \right\}, \tag{4}$$

where the subscript $\pi$ indicates that actions are generated by the policy $\pi = (\mu_1, \mu_2, \ldots)$. The optimal average reward is defined by $\rho^* = \sup_\pi \rho_\pi$.

B. The Bellman Equation

An alternative characterization of the optimal average reward is offered by the Bellman Equation. This equation verifies that a given average reward is optimal. The result presented here encapsulates the Bellman Equation and can be found in [25].

Theorem 4. [25] If $\rho \in \mathbb{R}$ and a bounded function $h : \mathcal{Z} \to \mathbb{R}$ satisfy

$$\rho + h(z) = \sup_{u \in \mathcal{U}} \left( g(z,u) + \int P_w(dw|z,u)\, h(F(z,u,w)) \right) \quad \forall z \in \mathcal{Z}, \tag{5}$$

then $\rho = \rho^*$. Furthermore, if there is a function $\mu : \mathcal{Z} \to \mathcal{U}$ such that $\mu(z)$ attains the supremum for each $z$ and satisfies (5), then $\rho_\pi = \rho^*$ for $\pi = (\mu_0, \mu_1, \ldots)$ with $\mu_t(h_t) = \mu(z_{t-1})$ for each $t$.

It is convenient to define a DP operator $T$ by

$$(Th)(z) = \sup_{u \in \mathcal{U}} \left( g(z,u) + \int P_w(dw|z,u)\, h(F(z,u,w)) \right) \tag{6}$$

for all functions $h$. Thus, the Bellman Equation can be written as $\rho \mathbb{1} + h = Th$. We also denote by $T_\mu h$ the DP operator restricted to the policy $\mu$.

V. Computing the feedback capacity 

A. Dynamic Programming Formulation 

In this section we associate the DP problem, which was discussed in the previous section, with the Ising channel. Using the notations previously defined, the state, $z_t$, would be the vector of channel state probabilities $[p(s_t = 0|y^t), p(s_t = 1|y^t)]$. In order to simplify notations, we consider the state $z_t$ to be the first component; that is, $z_t := p(s_t = 0|y^t)$. This comes with no loss of generality, since $p(s_{t-1} = 0|y^{t-1}) + p(s_{t-1} = 1|y^{t-1}) = 1$; hence, the second component can be derived from the first. The action, $u_t$, is a $2 \times 2$ stochastic matrix

$$u_t = \begin{bmatrix} p(x_t = 0|s_{t-1} = 0) & p(x_t = 1|s_{t-1} = 0) \\ p(x_t = 0|s_{t-1} = 1) & p(x_t = 1|s_{t-1} = 1) \end{bmatrix}. \tag{7}$$

The disturbance, $w_t$, is the channel output, $y_t$. The DP-Ising channel association is presented in Table III.

TABLE III
The Ising channel model notations vs. dynamic programming notations.

| Ising channel notations | Dynamic programming notations |
|---|---|
| $p(s_t = 0|y^t)$, probability of the channel state being 0 given the output | $z_t$, the DP state |
| $y_t$, the channel output | $w_t$, the DP disturbance |
| $p(x_t|s_{t-1})$, channel input probability given the channel state at time $t-1$ | $u_t$, the DP action |
| Eq. (8) | $z_t = F(z_{t-1}, u_t, w_t)$, the DP state evolves according to a function $F$ |
| $I(X_t, S_{t-1}; Y_t|y^{t-1})$ | $g(z_{t-1}, u_t)$, the DP reward function |



Note that given a policy $\pi = (\mu_1, \mu_2, \ldots)$, $p(s_t|y^t)$ is given in (8) (as shown in [12]), where $\mathbb{1}(\cdot)$ is the indicator function. The distribution of the disturbance, $w_t$, is $p(w_t|z^{t-1}, w^{t-1}, u^t) = p(w_t|z_{t-1}, u_t)$. Conditional independence from $z^{t-2}$ and $w^{t-1}$ given $z_{t-1}$ is due to the fact that the channel output is determined by the channel state and input.

The state evolves according to $z_t = F(z_{t-1}, u_t, w_t)$, where we obtain the function $F$ explicitly using relations from (8) as follows:

$$z_t = \begin{cases} \dfrac{z_{t-1}u_t(1,1) + 0.5(1-z_{t-1})u_t(2,1)}{z_{t-1}u_t(1,1) + 0.5 z_{t-1}u_t(1,2) + 0.5(1-z_{t-1})u_t(2,1)}, & \text{if } w_t = 0 \\[2ex] \dfrac{0.5(1-z_{t-1})u_t(2,1)}{0.5 z_{t-1}u_t(1,2) + 0.5(1-z_{t-1})u_t(2,1) + (1-z_{t-1})u_t(2,2)}, & \text{if } w_t = 1. \end{cases} \tag{9}$$

These expressions can be simplified by defining

$$\gamma_t := (1 - z_{t-1})u_t(2,2), \qquad \delta_t := z_{t-1}u_t(1,1), \tag{10}$$

and, using the fact that $u_t(1,1) = 1 - u_t(1,2)$ and $u_t(2,2) = 1 - u_t(2,1)$, we have that

$$z_t = \begin{cases} 1 + \dfrac{\delta_t - z_{t-1}}{1 + \delta_t - \gamma_t}, & \text{if } w_t = 0 \\[2ex] \dfrac{1 - z_{t-1} - \gamma_t}{1 + \gamma_t - \delta_t}, & \text{if } w_t = 1. \end{cases} \tag{11}$$

Note that $\gamma_t, \delta_t$ are functions of $z_{t-1}$. As shown in (10), given $z_{t-1}$, the action $u_t$ defines the pair $(\gamma_t, \delta_t)$ and vice versa. From here on, we represent the actions in terms of $\gamma_t$ and $\delta_t$. Since $u_t$ is a stochastic matrix, we have the constraints $0 \le \delta_t \le z_{t-1}$ and $0 \le \gamma_t \le 1 - z_{t-1}$.
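The equivalence of the matrix form of the state update and its simplified form in terms of $\gamma_t, \delta_t$ can be checked numerically. The sketch below is illustrative only: the function names are assumptions introduced here, and in the $w_t = 1$ branch the term for $s_{t-1} = 1, x_t = 1$ carries weight 1, since by Table II that input reaches the output noiselessly.

```python
import random

def next_state_matrix_form(z, u, w):
    """State update written with the 2x2 stochastic matrix u = ((u11, u12), (u21, u22))."""
    (u11, u12), (u21, u22) = u
    if w == 0:
        num = z * u11 + 0.5 * (1 - z) * u21
        den = z * u11 + 0.5 * z * u12 + 0.5 * (1 - z) * u21
    else:
        num = 0.5 * (1 - z) * u21
        den = 0.5 * z * u12 + 0.5 * (1 - z) * u21 + (1 - z) * u22
    return num / den

def next_state_simplified(z, delta, gamma, w):
    """State update in terms of delta = z * u11 and gamma = (1 - z) * u22."""
    if w == 0:
        return 1 + (delta - z) / (1 + delta - gamma)
    return (1 - z - gamma) / (1 + gamma - delta)

def max_gap(trials=1000, seed=0):
    """Largest disagreement between the two forms over random interior points."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        z = rng.uniform(0.05, 0.95)
        u11 = rng.uniform(0.05, 0.95)
        u22 = rng.uniform(0.05, 0.95)
        u = ((u11, 1 - u11), (1 - u22, u22))
        for w in (0, 1):
            full = next_state_matrix_form(z, u, w)
            short = next_state_simplified(z, z * u11, (1 - z) * u22, w)
            gap = max(gap, abs(full - short))
    return gap

if __name__ == "__main__":
    print("largest gap:", max_gap())
```

Sampling is restricted to interior points so that both output values have positive probability and no denominator vanishes.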

After finding the channel state probability, we can formulate the DP operator (6). We consider the reward to be $g(z_{t-1}, u_t) = I(X_t, S_{t-1}; Y_t|y^{t-1})$. Note that if $g(z_{t-1}, u_t) = I(X_t, S_{t-1}; Y_t|y^{t-1})$ then, using Theorem 3, we have that

$$\rho^* = \sup_{\pi} \liminf_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} I(X_t, S_{t-1}; Y_t|y^{t-1}) = C_{fb}. \tag{12}$$

First, we show that the reward function $I(X_t, S_{t-1}; Y_t|y^{t-1})$ is indeed a function of $u_t$ and $z_{t-1}$. To show this we note that

$$p(x_t, s_{t-1}, y_t|y^{t-1}) = p(s_{t-1}|y^{t-1})\, p(x_t|s_{t-1}, y^{t-1})\, p(y_t|x_t, s_{t-1}). \tag{13}$$

Recall that $p(y_t|x_t, s_{t-1})$ is given by the channel model. Thus, the reward depends only on $p(s_{t-1}|y^{t-1})$ and $p(x_t|s_{t-1}, y^{t-1}) = u_t$. Since $p(s_{t-1}|y^{t-1})$ is given by $z_{t-1}$, we have that

$$g(z_{t-1}, u_t) = I(X_t, S_{t-1}; Y_t|y^{t-1}) \tag{14}$$

is indeed a function of $u_t$ and $z_{t-1}$.

$$p(s_t|y^t) = \frac{\sum_{x_t, s_{t-1}} p(s_{t-1}|y^{t-1})\, u_t(s_{t-1}, x_t)\, p(y_t|s_{t-1}, x_t)\, \mathbb{1}(s_t = f(s_{t-1}, x_t, y_t))}{\sum_{x_t, s_t, s_{t-1}} p(s_{t-1}|y^{t-1})\, u_t(s_{t-1}, x_t)\, p(y_t|s_{t-1}, x_t)\, \mathbb{1}(s_t = f(s_{t-1}, x_t, y_t))} \tag{8}$$

Now we find $g(z_{t-1}, u_t)$ explicitly for the Ising channel:

$$I(X_t, S_{t-1}; Y_t|y^{t-1}) = H(Y_t|y^{t-1}) - H(Y_t|X_t, S_{t-1}, y^{t-1}) \tag{15}$$

$$\overset{(a)}{=} H_b\!\left( z_{t-1}u_t(1,1) + \frac{z_{t-1}u_t(1,2)}{2} + \frac{(1 - z_{t-1})u_t(2,1)}{2} \right) - \big( z_{t-1}u_t(1,2) \cdot 1 + (1 - z_{t-1})u_t(2,1) \cdot 1 \big) \tag{16}$$

$$\overset{(b)}{=} H_b\!\left( \frac{1}{2} + \frac{\delta_t - \gamma_t}{2} \right) + \delta_t + \gamma_t - 1, \tag{17}$$

where $H_b(\cdot)$ denotes the binary entropy function, (a) follows from Table IV, where the conditional distribution $p(x_t, s_{t-1}, y_t|y^{t-1})$ is calculated using (13), and (b) follows from the definition of $\delta$ and $\gamma$ given in (10) and the fact that $u_t$ is a stochastic matrix. Therefore, using (6) and (15)-(17), we write the DP operator explicitly:
operator explicitly: 

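$$(Th)(z) = \sup_{u \in \mathcal{U}} \left( g(z,u) + \int P_w(dw|z,u)\, h(F(z,u,w)) \right) \tag{18}$$

$$= \sup_{u \in \mathcal{U}} \left( H_b\!\left( \frac{1}{2} + \frac{\delta_t - \gamma_t}{2} \right) + \delta_t + \gamma_t - 1 + \int P_w(dw|z,u)\, h(F(z,u,w)) \right) \tag{19}$$

$$\overset{(a)}{=} \sup_{0 \le \delta \le z,\; 0 \le \gamma \le 1-z} \left( H_b\!\left( \frac{1}{2} + \frac{\delta - \gamma}{2} \right) + \delta + \gamma - 1 + \frac{1 + \delta - \gamma}{2}\, h\!\left( 1 + \frac{\delta - z}{\delta + 1 - \gamma} \right) + \frac{1 - \delta + \gamma}{2}\, h\!\left( \frac{1 - z - \gamma}{1 + \gamma - \delta} \right) \right), \tag{20}$$

where (a) follows from the fact that in the Ising channel case, $\int P_w(dw|z,u)\, h(F(z,u,w))$ takes the form $\sum_{w=0,1} p(w|z,u)\, h(F(z,u,w))$ and $F(z,u,w)$ is calculated explicitly using (11). We have formulated an equivalent dynamic program and found the DP operator explicitly. The objective is to maximize the average reward over all policies $\pi$. According to Theorem 4, if we identify a scalar $\rho$ and a bounded function $h$ that satisfy the Bellman Equation, $\rho + h(z) = Th(z)$, then $\rho$ is the optimal average reward and, therefore, the channel capacity.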

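TABLE IV
The conditional distribution $p(x_t, s_{t-1}, y_t|y^{t-1})$.

| $x_t$ | $s_{t-1}$ | $y_t = 0$ | $y_t = 1$ |
|---|---|---|---|
| 0 | 0 | $p(s_{t-1} = 0|y^{t-1})\,u_t(1,1)$ | 0 |
| 0 | 1 | $0.5\,p(s_{t-1} = 1|y^{t-1})\,u_t(2,1)$ | $0.5\,p(s_{t-1} = 1|y^{t-1})\,u_t(2,1)$ |
| 1 | 0 | $0.5\,p(s_{t-1} = 0|y^{t-1})\,u_t(1,2)$ | $0.5\,p(s_{t-1} = 0|y^{t-1})\,u_t(1,2)$ |
| 1 | 1 | 0 | $p(s_{t-1} = 1|y^{t-1})\,u_t(2,2)$ |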
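The closed-form reward $H_b(\frac{1}{2} + \frac{\delta - \gamma}{2}) + \delta + \gamma - 1$ can be compared against the mutual information computed directly from the joint distribution of Table IV. This is a minimal sketch with hypothetical helper names (`joint_table`, `mutual_information`, `reward_closed_form`), not code from the paper; here `z` stands for $p(s_{t-1} = 0|y^{t-1})$.

```python
import math

def hb(p):
    """Binary entropy in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def joint_table(z, u):
    """Table IV as a dict keyed by (x_t, s_{t-1}, y_t)."""
    (u11, u12), (u21, u22) = u
    return {
        (0, 0, 0): z * u11,             (0, 0, 1): 0.0,
        (0, 1, 0): 0.5 * (1 - z) * u21, (0, 1, 1): 0.5 * (1 - z) * u21,
        (1, 0, 0): 0.5 * z * u12,       (1, 0, 1): 0.5 * z * u12,
        (1, 1, 0): 0.0,                 (1, 1, 1): (1 - z) * u22,
    }

def mutual_information(table):
    """I(X_t, S_{t-1}; Y_t) in bits, computed directly from the joint table."""
    p_y = {0: 0.0, 1: 0.0}
    p_xs = {}
    for (x, s, y), p in table.items():
        p_y[y] += p
        p_xs[(x, s)] = p_xs.get((x, s), 0.0) + p
    total = 0.0
    for (x, s, y), p in table.items():
        if p > 0.0:
            total += p * math.log2(p / (p_xs[(x, s)] * p_y[y]))
    return total

def reward_closed_form(z, u):
    """Closed-form reward with delta = z * u11 and gamma = (1 - z) * u22."""
    delta, gamma = z * u[0][0], (1 - z) * u[1][1]
    return hb(0.5 + 0.5 * (delta - gamma)) + delta + gamma - 1

if __name__ == "__main__":
    z, u = 0.3, ((0.7, 0.3), (0.4, 0.6))
    print(mutual_information(joint_table(z, u)), reward_closed_form(z, u))
```

Both numbers agree because $P(y_t = 0) = \frac{1}{2} + \frac{\delta - \gamma}{2}$ and the conditional entropy of the output equals the probability $1 - \delta - \gamma$ that the input differs from the state.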
B. Numerical Evaluation

The aim of the numerical solution is to obtain some basic knowledge of the bounded function, $h$, which satisfies the Bellman Equation. In order to do so, the Bellman Equation is solved using a value iteration algorithm. The algorithm generates a sequence of iterations according to

$$J_{k+1} = T(J_k), \tag{21}$$

where $T$ is the DP operator, as in (18), and $J_0 = 0$.

For each $k$ and $z$, $J_k(z)$ is the maximal expected reward over $k$ periods given that the system starts in state $z$. Since rewards are positive, $J_k(z)$ grows with $k$ for each $z$. For each $k$, we define a differential reward function, $h_k(z) = J_k(z) - J_k(0)$.

For the numerical analysis, the interval $[0, 1]$ was represented with a 1000-point grid. Furthermore, each action interval, namely that of $\delta$, which is in $[0, z]$, and that of $\gamma$, which is in $[0, (1 - z)]$, was also represented with a 1000-point grid. Obviously, the result has limited accuracy due to machine error in the representation of real numbers. The numerical solution after 20 value iterations is shown in Fig. 3. This figure shows the $J_{20}(z)$ function and the corresponding policies, $\gamma^*(z)$ and $\delta^*(z)$. The policies are chosen numerically such that $T_{\gamma^*,\delta^*}h(z) \ge T_{\gamma,\delta}h(z)$ holds for all $\gamma, \delta$ on the grid, where $T_{\gamma,\delta}$ represents the DP operator restricted to the policy given by $\gamma, \delta$. Moreover, Fig. 3 shows the histogram of $z$, which represents the relative number of times each point has been occupied by a state $z$. These values of $z$ have been calculated using (11) over 250000 iterations, where each iteration calculates the next point from the current one. Each time a specific value of $z$ was visited, the program adds $\frac{1}{250000}$ to the count of this value, which gives the relative frequency with which each point was visited.
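The value iteration described above can be sketched as follows. This is not the authors' program: it is a deliberately coarse pure-Python version (a few dozen grid points rather than 1000), it evaluates $J_k$ between grid points by linear interpolation, and all function names are assumptions introduced here. The successive differences $J_{k+1}(z) - J_k(z)$ are used to bracket the average reward of the discretized problem.

```python
import math

def hb(p):
    """Binary entropy in bits, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def interp(values, z):
    """Linear interpolation of uniform-grid values at a point z, clamped to [0, 1]."""
    n = len(values) - 1
    pos = min(max(z, 0.0), 1.0) * n
    i = min(int(pos), n - 1)
    frac = pos - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]

def bellman_step(J, n_actions=20):
    """Apply the DP operator once, with delta in [0, z] and gamma in [0, 1 - z] on grids."""
    n = len(J) - 1
    new_J = []
    for k in range(n + 1):
        z = k / n
        best = -math.inf
        for i in range(n_actions + 1):
            delta = z * i / n_actions
            for j in range(n_actions + 1):
                gamma = (1.0 - z) * j / n_actions
                p0 = 0.5 * (1.0 + delta - gamma)  # probability that w = 0
                p1 = 1.0 - p0                     # probability that w = 1
                value = hb(p0) + delta + gamma - 1.0
                if p0 > 0.0:
                    value += p0 * interp(J, 1.0 + (delta - z) / (2.0 * p0))
                if p1 > 0.0:
                    value += p1 * interp(J, (1.0 - z - gamma) / (2.0 * p1))
                best = max(best, value)
        new_J.append(best)
    return new_J

def value_iteration(n_grid=40, n_actions=20, n_iter=25):
    """Return J after n_iter steps and the min/max of the last increment J_{k+1} - J_k."""
    J = [0.0] * (n_grid + 1)
    lo = hi = 0.0
    for _ in range(n_iter):
        new_J = bellman_step(J, n_actions)
        diffs = [b - a for a, b in zip(J, new_J)]
        lo, hi = min(diffs), max(diffs)
        J = new_J
    return J, lo, hi

if __name__ == "__main__":
    J, lo, hi = value_iteration()
    print("average reward bracket:", lo, hi)
```

Because the discretized operator is monotone and commutes with adding constants, the bracket `[lo, hi]` can only tighten from one iteration to the next; on finer grids it should close in on the capacity value reported in Theorem 1.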

VI. Analytical Solution 

To ease the search for the analytical solution, some assumptions based on the numerical solution have been made. Later on, these assumptions are proved to be correct.

A. Numerical results analysis

Here, the numerical results are explored in order to obtain further information on the possible analytical solution. We denote the optimal actions by $\gamma^*(z), \delta^*(z)$. Moreover, $h^*(z)$ denotes the function $h(z)$ restricted to the policy $\gamma^*, \delta^*$.
Fig. 3. Results from 20 value iterations. On the top-left, the value function $J_{20}$ is illustrated. On the top-right, the approximate relative state frequencies (histogram of $z$) are shown. At the bottom, the optimal action policies, $\delta^*$ and $\gamma^*$, are presented as obtained after the 20th value iteration.



As we can clearly see from the histogram presented in Fig. 3, the states, $z$, alternate between four major points. These points, which are symmetric around $\frac{1}{2}$ and two of which are 0 and 1, are denoted by $z_0, z_1, z_2, z_3$, where $z_0 = 0$ and $z_3 = 1$. In addition, the function $h^*(z)$ found numerically is assumed to be symmetric around $\frac{1}{2}$.

In order to find $z_i$, $i = 0, 1, 2, 3$, we define $\gamma_0^* = \gamma^*(z = 0) = a$. The variables $z_i$, $\gamma_i^* = \gamma^*(z_i)$, and $\delta_i^* = \delta^*(z_i)$ for $i = 0, 1, 2, 3$ are presented in Table V. These variables are written with respect to the unknown parameter, $a = \gamma_0^* = \gamma^*(z = 0)$, using (11) and the following assumptions, which are based on Fig. 3: i) we assume that $\delta^*(z) = z$ $\forall z \in [z_0, z_2]$, and ii) we assume there is a symmetry relation, $\gamma^*(z) = \delta^*(1 - z)$.

TABLE V
The variables $z_i$, $\gamma_i^*$, $\delta_i^*$.

| $z_i$ | $\gamma_i^*$ | $\delta_i^*$ |
|---|---|---|
| $z_0 = 0$ | $\gamma_0^* = a$ | $\delta_0^* = 0$ |
| $z_1 = \frac{1-a}{1+a}$ | $\gamma_1^* = \frac{2a}{1+a}$ | $\delta_1^* = \frac{1-a}{1+a}$ |
| $z_2 = \frac{2a}{1+a}$ | $\gamma_2^* = \frac{1-a}{1+a}$ | $\delta_2^* = \frac{2a}{1+a}$ |
| $z_3 = 1$ | $\gamma_3^* = 0$ | $\delta_3^* = a$ |

Using the numerical solution, we assume that $a \notin \{0, 1\}$, which implies that $z_1, z_2 \notin \{0, 1\}$. In addition, we assume that $z_1 \le z_2$ and hence $a \ge \frac{1}{3}$. There is no loss of generality here, since otherwise we could do the same analysis and switch between $z_1$ and $z_2$. Moreover, using (11) and basic algebra, one can notice that the selection of $\gamma_i^*, \delta_i^*$, $i = 0, 1, 2, 3$, as in Table V guarantees that the states, $z_t$, alternate between the points $z_i$. For example, in order to attain the $z_i$ points, (11) was used in the following way: if $z_{t-1} = 0$ then we have that $\gamma_t^* = a$, $\delta_t^* = 0$, which implies

- if $w_t = 0$ then $z_t = 1 + \frac{\delta_t^* - z_{t-1}}{1 + \delta_t^* - \gamma_t^*} = 1 + \frac{0}{1 - a} = 1$;

- if $w_t = 1$ then $z_t = \frac{1 - z_{t-1} - \gamma_t^*}{1 + \gamma_t^* - \delta_t^*} = \frac{1 - a}{1 + a}$.

Now, if $z_{t-1} = 1$, using the symmetry, we also have the point $z_t = 1 - \frac{1-a}{1+a} = \frac{2a}{1+a}$. Since $a \ge \frac{1}{3}$ we have $\frac{1-a}{1+a} \le \frac{2a}{1+a}$. Thus, for any specific time $t$, $z_t \in \{z_0 = 0, z_1 = \frac{1-a}{1+a}, z_2 = \frac{2a}{1+a}, z_3 = 1\}$.
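The claim that the four points are closed under the state update can be verified mechanically. The sketch below is an illustration with hypothetical function names; it encodes the four rows of Table V as $z_i$, $\gamma_i^*$, $\delta_i^*$ in terms of $a$, applies the simplified update (11) for both channel outputs, and checks that every reachable state is again one of the four points.

```python
def table_v(a):
    """Rows (z_i, gamma_i, delta_i) of Table V for a given parameter a."""
    z1 = (1 - a) / (1 + a)
    z2 = 2 * a / (1 + a)
    return [
        (0.0, a, 0.0),  # z_0
        (z1, z2, z1),   # z_1
        (z2, z1, z2),   # z_2
        (1.0, 0.0, a),  # z_3
    ]

def next_states(z, gamma, delta):
    """Apply the update for w = 0 and w = 1, skipping outputs of probability zero."""
    out = []
    p0 = 0.5 * (1 + delta - gamma)  # probability that w = 0
    if p0 > 0:
        out.append(1 + (delta - z) / (1 + delta - gamma))
    if p0 < 1:
        out.append((1 - z - gamma) / (1 + gamma - delta))
    return out

def is_closed(a, tol=1e-12):
    """True if every successor of a Table V point is again a Table V point."""
    points = [row[0] for row in table_v(a)]
    return all(
        any(abs(z_next - p) < tol for p in points)
        for z, gamma, delta in table_v(a)
        for z_next in next_states(z, gamma, delta)
    )

if __name__ == "__main__":
    print(is_closed(0.4503))
```

The check passes for any $a$ strictly between 0 and 1, which matches the text: closure comes from the structure of the policy, while the specific value of $a$ is fixed only later by optimality.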

In addition, based on the numerical solution, we assume that $\gamma^*$ and $\delta^*$ can be approximated by straight lines. Therefore, the following expressions can be found:

$$\gamma^*(z) = \begin{cases} a + az, & \text{if } z \in [z_0, z_1] \\ 1 - z, & \text{if } z \in [z_1, z_3] \end{cases} \tag{22}$$

$$\delta^*(z) = \begin{cases} z, & \text{if } z \in [z_0, z_2] \\ a(2 - z), & \text{if } z \in [z_2, z_3]. \end{cases} \tag{23}$$

Note that if a scalar, $\rho$, and a function, $h$, solve the Bellman Equation, so do $\rho$ and $h + c\mathbb{1}$ for any scalar $c$. Hence, with no loss of generality, we can assume $h(\frac{1}{2}) = 1$.

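Lemma 2. Let $\gamma^*$ and $\delta^*$ be as in (22), (23); then

$$h^*(z) = H_b(z) \quad \forall z \in [z_1, z_2], \tag{24}$$

and $h^*(0) = h^*(1) = \rho^*$ satisfies $T_{\delta^*,\gamma^*}h^*(z) = h^*(z) + \rho^*$ $\forall z \in [z_1, z_2]$.

Proof: Using the definition of $(T_{\delta^*,\gamma^*}h^*)(z)$, $\delta^*$, and $\gamma^*$ we obtain

$$h^*(z) + \rho^* = (T_{\delta^*,\gamma^*}h^*)(z)$$

$$= H_b\!\left( \frac{1}{2} + \frac{z - (1-z)}{2} \right) + z + (1 - z) - 1 + \frac{1 + z - (1-z)}{2}\, h^*\!\left( 1 + \frac{z - z}{z + 1 - (1-z)} \right) + \frac{1 - z + (1-z)}{2}\, h^*\!\left( \frac{1 - z - (1-z)}{1 + (1-z) - z} \right)$$

$$= H_b(z) + z\,h^*(1) + (1-z)\,h^*(0) \tag{25}$$

for all $z \in [z_1, z_2]$. Using the symmetry $h^*(0) = h^*(1)$, we conclude that $(T_{\delta^*,\gamma^*}h^*)(z) = H_b(z) + h^*(0)$ $\forall z \in [z_1, z_2]$, which implies $h^*(0) = h^*(1) = \rho^*$ and $h^*(z) = H_b(z)$. ■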

Lemma 3. Let $\gamma^*$ and $\delta^*$ be as in (22), (23); then

$$h^*(z) = \frac{1}{1-a} H_b\!\left( \frac{2a + (1-a)z}{2} \right) - z + \frac{az - 4a - z}{2(1-a)}\,\rho^* + \frac{2a + (1-a)z}{2(1-a)}\, H_b\!\left( \frac{2a}{a(2-z) + z} \right) \quad \forall z \in [z_2, z_3], \tag{26}$$

and $h^*(0) = h^*(1) = \rho^*$ satisfies $T_{\delta^*,\gamma^*}h^*(z) = h^*(z) + \rho^*$ $\forall z \in [z_2, z_3]$.

Proof: Using the policy $\gamma^*, \delta^*$ as in (22), (23), one can write $(T_{\delta^*,\gamma^*}h^*)(z)$ $\forall z \in [z_2, z_3]$:

$$h^*(z) + \rho^* = (T_{\delta^*,\gamma^*}h^*)(z) = H_b\!\left( \frac{2a + (1-a)z}{2} \right) + 2a - (1+a)z + \frac{2a + (1-a)z}{2}\, h^*\!\left( \frac{4a - 2az}{2a + (1-a)z} \right) + \frac{(1-a)(2-z)}{2}\, h^*(0). \tag{27}$$

Note that the argument of the function $h^*$ in (27), which we denote here as $f(z) := \frac{4a - 2az}{2a + (1-a)z}$, is in $[z_2, z_3]$ for $z \in [z_2, z_3]$. Hence, we can apply (27) twice. By using simple algebra we obtain

$$h^*(z) = \frac{1}{1-a} H_b\!\left( \frac{2a + (1-a)z}{2} \right) - z + \frac{az - 4a - z}{2(1-a)}\,\rho^* + \frac{2a + (1-a)z}{2(1-a)}\, H_b\!\left( \frac{2a}{a(2-z) + z} \right) \quad \forall z \in [z_2, z_3]. \tag{28}$$

Using the symmetry relation, one can derive $h^*(z)$ for $z \in [z_0, z_1]$. If we set $z = 1$ in (28) and take $\rho^* = h^*(1)$ we obtain that $\rho^* = \frac{2H_b(a)}{3+a}$. This expression is examined later on.

In order to find the variable $a$ we make sure that our $\gamma^*, \delta^*$ are indeed the arguments that maximize $Th^*(z)$. To show this, we differentiate the expression $T_{\gamma,\delta}h^*(z)$ with respect to $\delta$ and set the result to 0.
Fig. 4. The results using the numerical analysis. On the top-left, the value function $h^*(z)$ is shown. On the top-right, the assumed relative state frequencies are shown (the figure presents only the states). At the bottom, the optimal action policies $\delta^*$ and $\gamma^*$ are illustrated. As can be seen, these match perfectly the results obtained numerically after the 20th value iteration.



Recall that $T_{\gamma,\delta}$ is the DP operator restricted to the policy $\gamma, \delta$; hence it is the DP operator without the supremum. Using basic algebra and setting $\gamma^*, \delta^*$ as in (22), (23) with $a$ as a variable, we obtain

$$\frac{\partial T_{\delta,\gamma}h^*(z)}{\partial \delta} = \frac{\rho^*}{a - 1} + \frac{\log_2(a)}{2(a - 1)}, \tag{29}$$

with the convention $0 \cdot \log(0) = 0$, since $\lim_{t \to 0} t \cdot \log(t) = 0$. Replacing $\rho^*$ with $\frac{2H_b(a)}{3+a}$ and setting the result to zero, we find the variable $a$ to be a root of the fourth-degree polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$, which can be found explicitly using the Ferrari formula. We find that this polynomial has two complex roots, one root which is greater than 1 ($\approx 3.63$), and one root in $[0, 1]$, which is roughly 0.4503. Hence, the only suitable value is $a \approx 0.4503$. Calculating $\rho^* = \frac{2H_b(a)}{3+a}$, we find that $\rho^* \approx 0.575522$.
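The step from the stationarity condition to the polynomial is short enough to spell out. The derivation below is a sketch that assumes the condition obtained by setting (29) to zero reduces to $2\rho^* + \log_2 a = 0$; all logarithms are base 2.

```latex
\begin{align*}
\frac{2H_b(a)}{3+a} = -\frac{1}{2}\log_2 a
  &\;\Longleftrightarrow\; 4H_b(a) = -(3+a)\log_2 a \\
  &\;\Longleftrightarrow\; -4a\log_2 a - 4(1-a)\log_2(1-a) = -3\log_2 a - a\log_2 a \\
  &\;\Longleftrightarrow\; 3(1-a)\log_2 a = 4(1-a)\log_2(1-a) \\
  &\;\Longleftrightarrow\; a^3 = (1-a)^4 \qquad (a \neq 1) \\
  &\;\Longleftrightarrow\; a^4 - 5a^3 + 6a^2 - 4a + 1 = 0 .
\end{align*}
```

The same condition $a^3 = (1-a)^4$ is obtained by setting the derivative of $\frac{2H_b(x)}{3+x}$ to zero, which is consistent with the two characterizations of $a$ in Theorem 1.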
In Fig. 4 the results using the numerical analysis are presented. The upper pictures present the function

$$h^*(z) = \begin{cases} H_b(z), & \text{if } z \in [z_1, z_2] \\[1ex] \dfrac{1}{1-a} H_b\!\left( \dfrac{2a + (1-a)z}{2} \right) - z + \dfrac{az - 4a - z}{2(1-a)}\,\rho^* + \dfrac{2a + (1-a)z}{2(1-a)}\, H_b\!\left( \dfrac{2a}{a(2-z) + z} \right), & \text{if } z \in [z_2, z_3] \end{cases} \tag{30}$$

as found using the numerical evaluation and the histogram of $z$, where for $z \in [0, z_1]$ we derive $h^*(z)$ using the symmetry of $h^*(z)$ with respect to $\frac{1}{2}$. The histogram shows all the values which were occupied by the state $z_t$ for some $t$; the relative frequency shows how many times a value has been occupied. The bottom row presents the action parameters $\delta^*(z)$ and $\gamma^*(z)$ as obtained numerically. Fig. 5 shows the function $J_{20}$, which was attained after 20 value iterations, together with the function $h^*(z)$ obtained in the numerical analysis.

Fig. 5. The function $J_{20}$ as found numerically and the function $h^*(z)$ found using the numerical analysis on the same plot. The two results match perfectly.



B. Analytical Solution Verifications 

In this section we verify that the function $h^*(z)$, as found in (30), and $\rho^* = \frac{2H_b(a)}{3+a}$ indeed satisfy the Bellman Equation, $Th^*(z) = h^*(z) + \rho^*$. Furthermore, we show that the selection of $\delta^*, \gamma^*$, as in (23), (22), maximizes $Th^*(z)$. Namely, we prove Theorem 1.
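Before the formal argument, both claims can be sanity-checked numerically. The sketch below is an illustration, not part of the proof: it assumes $h^*$ is $H_b(z)$ on $[z_1, z_2]$, the closed form of Lemma 3 on $[z_2, 1]$, and the mirror image $h^*(1-z)$ on $[0, z_1]$; the policy is the piecewise-linear one of (22), (23); all function names are introduced here.

```python
import math

def hb(p):
    """Binary entropy in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def solve_a(tol=1e-13):
    """Root of x^4 - 5x^3 + 6x^2 - 4x + 1 in [0, 1], by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid**4 - 5 * mid**3 + 6 * mid**2 - 4 * mid + 1 > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

A = solve_a()
RHO = 2 * hb(A) / (3 + A)
Z1, Z2 = (1 - A) / (1 + A), 2 * A / (1 + A)

def h_star(z):
    """Candidate h*(z), extended to [0, z_1] by the symmetry h*(z) = h*(1 - z)."""
    if z < Z1:
        z = 1.0 - z
    if z <= Z2:
        return hb(z)
    lin = 2 * A + (1 - A) * z
    return (hb(lin / 2) / (1 - A) - z
            + (A * z - 4 * A - z) / (2 * (1 - A)) * RHO
            + lin / (2 * (1 - A)) * hb(2 * A / lin))

def policy(z):
    """The pair (delta*, gamma*) given by the piecewise-linear policy."""
    gamma = A + A * z if z <= Z1 else 1.0 - z
    delta = z if z <= Z2 else A * (2.0 - z)
    return delta, gamma

def bellman_rhs(z, delta, gamma):
    """Reward plus expected h* of the next state for the action (delta, gamma)."""
    p0 = 0.5 * (1 + delta - gamma)
    value = hb(p0) + delta + gamma - 1
    if p0 > 0:
        value += p0 * h_star(1 + (delta - z) / (2 * p0))
    if p0 < 1:
        value += (1 - p0) * h_star((1 - z - gamma) / (2 * (1 - p0)))
    return value

def max_policy_residual(n=1000):
    """Largest |T_policy h*(z) - h*(z) - rho*| over a uniform grid of states."""
    return max(abs(bellman_rhs(k / n, *policy(k / n)) - h_star(k / n) - RHO)
               for k in range(n + 1))

def max_grid_improvement(n=20, m=30):
    """Largest gain of any grid action over the policy; should not be positive."""
    worst = -math.inf
    for k in range(n + 1):
        z = k / n
        base = bellman_rhs(z, *policy(z))
        for i in range(m + 1):
            for j in range(m + 1):
                gain = bellman_rhs(z, z * i / m, (1 - z) * j / m) - base
                worst = max(worst, gain)
    return worst

if __name__ == "__main__":
    print("policy residual:", max_policy_residual())
    print("best grid improvement:", max_grid_improvement())
```

A residual at the level of floating-point rounding supports $T_{\delta^*,\gamma^*}h^* = h^* + \rho^*$, and a non-positive grid improvement is consistent with the optimality of $\delta^*, \gamma^*$ that the lemmas in this section establish.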

We begin with proving the following lemma: 

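Lemma 4. The policies $\gamma^*, \delta^*$, which are defined in (22), (23):

$$\gamma^*(z) = \begin{cases} a + az, & \text{if } z \in [z_0, z_1] \\ 1 - z, & \text{if } z \in [z_1, z_3] \end{cases}$$

$$\delta^*(z) = \begin{cases} z, & \text{if } z \in [z_0, z_2] \\ a(2 - z), & \text{if } z \in [z_2, z_3] \end{cases}$$

maximize $Th^*(z)$, i.e. $T_{\delta^*,\gamma^*}h^*(z) = Th^*(z)$, where $a \approx 0.4503$ is a specific root of the fourth-degree polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$.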

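In order to prove Lemma 4 we need two main lemmas, which use the following notation: we denote the expression $g(\delta, \gamma) = T_{\delta,\gamma}h^*(z)$ (which is $Th^*(z)$ without the supremum, restricted to the policy $(\delta, \gamma)$) as follows:

$$g(\delta, \gamma) = H_b\!\left( \frac{1}{2} + \frac{\delta - \gamma}{2} \right) + \delta + \gamma - 1 + \frac{1 + \delta - \gamma}{2}\, h^*\!\left( 1 + \frac{\delta - z}{\delta + 1 - \gamma} \right) + \frac{1 - \delta + \gamma}{2}\, h^*\!\left( \frac{1 - z - \gamma}{1 + \gamma - \delta} \right). \tag{31}$$

The first main lemma shows that the function $g(\delta, \gamma)$ is concave in $(\delta, \gamma)$. To prove this lemma we first show that the concatenation of any finite collection of continuous concave functions, $\{f_i : [\alpha_{i-1}, \alpha_i] \to \mathbb{R}\}$, $i = 1, 2, \ldots, n$, where $\alpha_{i-1} \le \alpha_i$ and $\alpha_i \in \mathbb{R}$, in which neighboring functions have the same one-sided derivative at each concatenation point ($f'_{i,-}(\alpha_i) = f'_{i+1,+}(\alpha_i)$), is a continuous concave function. It is sufficient to show that the concatenation of two continuous, concave functions with the same derivative at the concatenation point is continuous and concave. The proof for any finite collection of such functions follows by induction.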

Lemma 5. Let / : [a, /3] — > R, (7 : [/3,7] —^M. be two continuous, concave functions where f{f3) = g{f3), 
fL{f3) = g'j^ifi), where fLifi) denotes the left derivative of f{x) at /3 and g'^{/3) denotes the right 
derivative of g{x) at /3. The function obtained by concatenating f{x) and g{x) defined by 

(fix), if X £ [a, /31 
(32) 
g{x), //■xG[/3,7] 

is continuous and concave. 

The proof of Lemma |5] is in the appendix. 

We now conclude that the function $h^*(z)$ is concave in $z$.

Corollary 5. The function $h^*(z)$ as given in (30) is continuous and concave for all $z \in [0, 1]$.

Proof: It is well known that the binary entropy function, $H_b(z)$, is concave in $z$. Thus, the function $h^*(z)$ for $z \in [z_1, z_2]$ is concave. In order to show that $h^*(z)$ for $z \in [z_0, z_1]$ and for $z \in [z_2, z_3]$ is concave we consider the three terms of $h^*(z)$ in (30) on these intervals. First, the binary entropy term is concave since it is a composition of the binary entropy function and a linear, non-decreasing function of $z$. Second, the term involving $\rho^*$ is also concave in $z$ since it is linear in $z$. Third, the remaining term is concave by the perspective property of concave functions. Hence, the sum of the three expressions is also concave, which implies that $h^*(z)$ is concave in $z$ for $z \in [z_0, z_1]$, $z \in [z_1, z_2]$, and for $z \in [z_2, z_3]$. It is easy to verify that the function $h^*(z)$ is a concatenation of three functions that satisfies the conditions in Lemma 5. Thus, we conclude that $h^*(z)$ is continuous and concave for all $z \in [0, 1]$. ■
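Using the previous corollary we obtain the following:

Lemma 6. Let $h(z)$ be a concave function. The expression given by

$$H_b\left(\frac{1}{2} + \frac{\delta-\gamma}{2}\right) + \delta + \gamma - 1 + \frac{1+\delta-\gamma}{2}\, h\left(1 + \frac{\delta - z}{1+\delta-\gamma}\right) + \frac{1-\delta+\gamma}{2}\, h\left(\frac{1-z-\gamma}{1+\gamma-\delta}\right)$$

is concave in $(\delta, \gamma)$.

The proof of Lemma 6 is given in the Appendix.
Finally, we obtain the concavity of the function $g(\delta,\gamma)$.

Corollary 6. The function $g(\delta,\gamma)$ is concave in $(\delta,\gamma)$.

Proof: From Corollary 5 we have that $h^*(z)$ is a concave function. Using Lemma 6 and the definition of $g(\delta,\gamma)$ we conclude that $g(\delta,\gamma)$ is concave in $(\delta,\gamma)$. ■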
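The second main lemma shows that $\delta^*$ and $\gamma^*$ are optimal, which means that they maximize the function $g(\delta,\gamma)$. First, we state the KKT conditions adjusted to our problem:

Lemma 7 (KKT conditions). Let $g(\delta, \gamma)$ be the objective function. We consider the following optimization problem:

$$\max_{\delta,\gamma}\; g(\delta, \gamma) \quad \text{s.t.} \quad \gamma - 1 + z \le 0, \quad -\gamma \le 0, \quad \delta - z \le 0, \quad -\delta \le 0.$$

The Lagrangian of $g(\delta, \gamma)$ is $\mathcal{L}(\delta, \gamma, \lambda) = g(\delta, \gamma) - \lambda_1(\gamma - 1 + z) + \lambda_2\gamma - \lambda_3(\delta - z) + \lambda_4\delta$. Since $g(\delta, \gamma)$ is a concave function, the following conditions are sufficient and necessary for optimality:

(1) $\frac{\partial g(\delta,\gamma)}{\partial \delta} = \lambda_3 - \lambda_4$.

(2) $\frac{\partial g(\delta,\gamma)}{\partial \gamma} = \lambda_1 - \lambda_2$.

(3) $\gamma \ge 0$, $\delta \ge 0$.

(4) $\gamma - 1 + z \le 0$, $\delta - z \le 0$.

(5) $\lambda_1, \lambda_2, \lambda_3, \lambda_4 \ge 0$.

(6) $\lambda_1(\gamma - 1 + z) = \lambda_2\gamma = \lambda_3(\delta - z) = \lambda_4\delta = 0$.

The optimality conditions below are a consequence of the KKT conditions and Corollary 6.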

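Lemma 8. The following optimality conditions hold:

(a) If $z \in [\frac{2a}{1+a}, 1]$, $\frac{\partial g(\delta^*,\gamma^*)}{\partial \delta} = 0$, and $\frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma} \ge 0$, then $\delta^*, \gamma^*$ are optimal.

(b) If $z \in [\frac{1-a}{1+a}, \frac{2a}{1+a}]$, $\frac{\partial g(\delta^*,\gamma^*)}{\partial \delta} \ge 0$, and $\frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma} \ge 0$, then $\delta^*, \gamma^*$ are optimal.

Proof: First, we consider case (a), in which $z \in [\frac{2a}{1+a}, 1]$: if we take $\lambda_2 = \lambda_3 = \lambda_4 = 0$, $\lambda_1 = \frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma}$, and $\gamma^* = 1 - z$, the KKT conditions, which are given in Lemma 7, hold since $\lambda_1 \ge 0$.

Second, we consider case (b), in which $z \in [\frac{1-a}{1+a}, \frac{2a}{1+a}]$: if we take $\lambda_2 = \lambda_4 = 0$, $\lambda_1 = \frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma}$, $\lambda_3 = \frac{\partial g(\delta^*,\gamma^*)}{\partial \delta}$, $\delta^* = z$, and $\gamma^* = 1 - z$, the KKT conditions hold since $\lambda_1 \ge 0$ and $\lambda_3 \ge 0$. ■

Using Corollary 6 and Lemma 8 we prove Lemma 4.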

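Proof of Lemma 4: From Corollary 6 we have that $g(\delta,\gamma)$ is concave in $(\delta,\gamma)$. Thus, the KKT conditions are sufficient and necessary. First we assume that $z \in [\frac{2a}{1+a}, 1]$. We note that the expression $1 + \frac{\delta^* - z}{1+\delta^*-\gamma^*}$ is in $[\frac{2a}{1+a}, 1]$. Furthermore, replacing $\gamma, \delta$ with $\gamma^*, \delta^*$ respectively, we find $\frac{1-z-\gamma}{1+\gamma-\delta}$ to be 0. We differentiate $g(\delta,\gamma)$ with respect to $\delta$ and evaluate it at $(\delta^*, \gamma^*)$: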

85 a-1 

Using basic algebra we find that the expression is equal to zero iff $a^4 - 5a^3 + 6a^2 - 4a + 1 = 0$.

Thus, setting $a \approx 0.4503$ to be the unique real root in the interval $[0, 1]$ of the polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$ we establish that

$$\frac{\partial g(\delta^*,\gamma^*)}{\partial \delta} = \frac{2\rho^* + \log_2(a)}{a - 1} = 0. \qquad (33)$$
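The vanishing of the numerator $2\rho^* + \log_2(a)$ can be checked directly from the polynomial condition. The short derivation below is a sketch of one direction only; it assumes $\rho^* = \frac{2H_b(a)}{3+a}$ and uses the identity $x^4 - 5x^3 + 6x^2 - 4x + 1 = (1-x)^4 - x^3$:

$$(1-a)^4 = a^3 \;\Longrightarrow\; \log_2(1-a) = \tfrac{3}{4}\log_2(a),$$

$$H_b(a) = -a\log_2(a) - (1-a)\log_2(1-a) = -\log_2(a)\left(a + \tfrac{3}{4}(1-a)\right) = -\frac{3+a}{4}\log_2(a),$$

$$\rho^* = \frac{2H_b(a)}{3+a} = -\tfrac{1}{2}\log_2(a) \;\Longrightarrow\; 2\rho^* + \log_2(a) = 0.$$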
Now we differentiate $g(\delta, \gamma)$ with respect to $\gamma$ when $a \approx 0.4503$. We find that the derivative is strictly positive:

$$\frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma} > 0. \qquad (34)$$



Note that the derivative is positive when $a < 0.9$. This can be seen since $\frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma}$ is a monotonically

increasing function of a, which for a < 0.9, equal to zero for z < ^q^. Since we have found a to be 
approximately 0.4503 we have that $\frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma} > 0$. Using Lemma 8 case (a) we conclude that $\delta^*, \gamma^*$ are optimal. The analysis for $z \in [0, \frac{1-a}{1+a}]$ is completely analogous.

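Second, we assume $z \in [\frac{1-a}{1+a}, \frac{2a}{1+a}]$. In this case, we have that $h^*(z) = H_b(z)$. Using simple algebra we obtain that

$$\frac{\partial g(\delta^*,\gamma^*)}{\partial \delta} \ge 0, \qquad \frac{\partial g(\delta^*,\gamma^*)}{\partial \gamma} \ge 0. \qquad (35)$$

Using Lemma 8 case (b) we conclude that $\delta^*, \gamma^*$ are optimal. Thus, for all $z \in [0, 1]$ we have that $\delta^*, \gamma^*$ are optimal. ■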

We now prove Theorem 1. In the proof we show that $Th^*(z) = h^*(z) + \rho^*$, where $\rho^* = \frac{2H_b(a)}{3+a}$, the $\delta, \gamma$ which maximize the operator $T$ are $\delta^*, \gamma^*$, and $h^*(z)$ is given in (30). This implies that $h^*(z)$ solves the Bellman Equation. Therefore, $\rho^*$ is the optimal average reward, which is equal to the channel capacity.

First, we prove Theorem 1(a).

Proof of Theorem 1(a): We would like to prove that the capacity of the Ising channel with feedback is $C_f = \frac{2H_b(a)}{3+a} \approx 0.5755$, where $a \approx 0.4503$ is a specific root of the fourth-degree polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$. According to the theorem on the Bellman equation, if we identify a scalar $\rho$ and a bounded function $h(z)$ such that

$$\rho + h(z) = \sup_{u \in \mathcal{U}} \left( g(z,u) + \int P_w(dw | z, u)\, h(F(z,u,w)) \right) \quad \forall z \in \mathcal{Z} \qquad (36)$$

then $\rho = \rho^*$. Using Lemma 4 we obtain that $\delta^*$ and $\gamma^*$, as defined in (22), (23), maximize $Th^*(z)$ when $a \approx 0.4503$. In addition, we show that $h^*(z)$, which is defined in (30), satisfies the Bellman Equation, $Th^*(z) = h^*(z) + \rho^*$, where $\rho^* = \frac{2H_b(a)}{3+a}$. This follows from Lemma 2 and Lemma 3. Therefore, we have identified a bounded function, $h^*(z)$, and a constant, $\rho^*$, together with a policy, $\gamma^*, \delta^*$, which satisfy the Bellman Equation. Thus, the capacity of the Ising channel with feedback is $\rho^* = h^*(0) = h^*(1) = \frac{2H_b(a)}{3+a} \approx 0.575522$. ■
Now we prove Theorem 1(b). The proof is straightforward.

Proof of Theorem 1(b): We define $g(z) = \frac{2H_b(z)}{3+z}$ and we calculate $g'(z) = \frac{8\log_2(1-z) - 6\log_2(z)}{(3+z)^2}$. We have $8\log_2(1 - z) - 6\log_2(z) = 0$ iff $(1 - z)^8 - z^6 = 0$. The polynomial $(1 - z)^8 - z^6$ is reducible, hence we can write $(1 - z)^8 - z^6 = (1 - 4z + 6z^2 - 3z^3 + z^4)(1 - 4z + 6z^2 - 5z^3 + z^4)$. Therefore, $g'(a) = 0$ since $a \approx 0.4503$ is the root of the polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$. It is easy to verify that $g'(a - \epsilon) > 0$ and $g'(a + \epsilon) < 0$. Together with the fact that $a$ is the only real number in $[0, 1]$ which sets $g'(z)$ to zero, $a$ is a maximum point of $g(z)$ for $0 \le z \le 1$. ■
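Both claims in this proof can be checked numerically. The sketch below is illustrative only (the helper names and the grid resolution are assumptions, not part of the paper): it locates the maximizer of $\frac{2H_b(z)}{3+z}$ on a fine grid and verifies the stated factorization of $(1-z)^8 - z^6$ by multiplying coefficient arrays.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def binary_entropy(p):
    """Binary entropy in bits, with the argument clipped away from 0 and 1."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)


def g(z):
    """Objective 2*H_b(z) / (3 + z) from Theorem 1(b)."""
    return 2 * binary_entropy(z) / (3 + z)


# Grid search for the maximizer on (0, 1).
grid = np.linspace(1e-6, 1 - 1e-6, 200_001)
z_best = grid[np.argmax(g(grid))]
c_best = g(z_best)

# Coefficient arrays are ordered from the lowest degree to the highest.
lhs = P.polysub(P.polypow([1, -1], 8), [0, 0, 0, 0, 0, 0, 1])    # (1-z)^8 - z^6
rhs = P.polymul([1, -4, 6, -3, 1], [1, -4, 6, -5, 1])            # product of the two quartics
factorization_holds = np.allclose(lhs, rhs)
```

The grid maximizer should land close to the quoted $a \approx 0.4503$, and the maximum value close to the quoted capacity of roughly 0.5755.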

VII. Relation of the DP Results and the Coding Scheme 

In this section we analyse the DP results and derive the coding scheme from these results. In particular, we use the histogram of $z$, which is presented in Fig. 3. However, we first recap a few definitions that were used in this paper:

1) $z_t = p(s_t = 0 | y^t)$, where $s_t = x_t$. This means that $z_t$ is the probability of the input $x_t$ being 0 given the output. Thus, if $z_t = 0$, then $x_t = 1$ with probability 1.

2) We defined $\delta = z_{t-1}u_t(1,1)$ and $\gamma = (1 - z_{t-1})u_t(2,2)$, where $u_t(1,1) = \Pr(x_t = 0 | s_{t-1} = 0)$ and $u_t(2,2) = \Pr(x_t = 1 | s_{t-1} = 1)$.

3) We established that

$$\gamma^*(z) = \begin{cases} a + az, & \text{if } z \in [p_0, p_1] \\ 1 - z, & \text{if } z \in [p_1, p_3] \end{cases} \qquad (37)$$

$$\delta^*(z) = \begin{cases} z, & \text{if } z \in [p_0, p_2] \\ a(2 - z), & \text{if } z \in [p_2, p_3]. \end{cases} \qquad (38)$$

4) We also established the equation in (11):

$$z_t = \begin{cases} 1 + \frac{\delta - z_{t-1}}{1 + \delta - \gamma}, & \text{if } y_t = 0 \\ \frac{1 - z_{t-1} - \gamma}{1 + \gamma - \delta}, & \text{if } y_t = 1. \end{cases} \qquad (39)$$

We also recall that in the histogram, which is presented in Fig. 3, $z_t$ alternates between four points, two of which are 0 and 1. In order to keep in mind that these points stand for probabilities we denote them as $p_0, p_1, p_2, p_3$, where $p_0 = 0$ and $p_3 = 1$. Using (11) and the definition of $\gamma^*, \delta^*$ we can derive Table VI. The table presents $z_{t+1}$ as a function of $z_t$ and $y_{t+1}$. It also presents the optimal action parameters, $u_t(1,1), u_t(2,2)$, for each state. The action parameters are calculated from the parameters $\delta^*, \gamma^*$.

TABLE VI

The DP states at time $t + 1$ as a function of the previous state and the output, calculated using (11). The table presents the optimal actions for each state.

                   z_t = p_0        z_t = p_1        z_t = p_2        z_t = p_3
y_{t+1} = 0        z_{t+1} = p_3    z_{t+1} = p_3    z_{t+1} = p_3    z_{t+1} = p_2
y_{t+1} = 1        z_{t+1} = p_1    z_{t+1} = p_0    z_{t+1} = p_0    z_{t+1} = p_0
u_{t+1}(2,2)       a                1                1                irrelevant
u_{t+1}(1,1)       irrelevant       1                1                a
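The transitions in Table VI can be reproduced by plugging the optimal policies into the state update (39). The following is a small sketch under stated assumptions: it uses the rounded value $a = 0.4503$ quoted in the text, takes $p_1 = \frac{1-a}{1+a}$ and $p_2 = \frac{2a}{1+a}$ (the points where the branches of the policies meet), and all function names are hypothetical.

```python
A = 0.4503  # rounded root quoted in the text (assumption: rounding is harmless here)

P0, P1, P2, P3 = 0.0, (1 - A) / (1 + A), 2 * A / (1 + A), 1.0


def gamma_star(z, a=A):
    """gamma*(z) as in (37)."""
    return a + a * z if z <= (1 - a) / (1 + a) else 1 - z


def delta_star(z, a=A):
    """delta*(z) as in (38)."""
    return z if z <= 2 * a / (1 + a) else a * (2 - z)


def next_state(z, y, a=A):
    """DP state update (39) under the optimal policy, for channel output y."""
    d, g = delta_star(z, a), gamma_star(z, a)
    if y == 0:
        return 1 + (d - z) / (1 + d - g)
    return (1 - z - g) / (1 + g - d)


# Expected transitions, read off Table VI: (current state, output) -> next state.
TABLE_VI = {
    (P0, 0): P3, (P1, 0): P3, (P2, 0): P3, (P3, 0): P2,
    (P0, 1): P1, (P1, 1): P0, (P2, 1): P0, (P3, 1): P0,
}

max_error = max(abs(next_state(z, y) - target) for (z, y), target in TABLE_VI.items())
```

A `max_error` at the level of floating-point rounding indicates that the four points are closed under the update, which is why the histogram of $z_t$ concentrates on them.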



Assume at first that at time $t - 1$ the state is $z_{t-1} = p_0 = 0$.

1) Decoder: Using the definition of $z_{t-1}$ we deduce that $p(s_{t-1} = 0 | y^{t-1}) = 0$ and hence $x_{t-1} = 1$ with probability 1. Thus, the decoder decodes 1.

2) Encoder: The optimal actions are $\delta^*(0) = 0$ and $\gamma^*(0) = a$. Using the definition of $\gamma^*$ we conclude that $\Pr(x_t = 1 | s_{t-1} = 1) = a$. Thus, $\Pr(x_t = 0 | s_{t-1} = 1) = 1 - a$, which means that, given that $s_{t-1} = x_{t-1} = 1$, the probability to send 1 again is $a$. This result gives us the alternation probability from 1 to 0, which is $1 - a$. Since $s_{t-1} = x_{t-1} = 1$ with probability 1, the action parameter $\delta^*$ is irrelevant because it concerns the case in which $s_{t-1} = 0$. Indeed, using the definition of $\delta^*$, we can see that $\delta^*(0) = 0$.

We now use Table VI in order to find the next state. We have two options: if the output is 0 we move to state $p_3 = 1$. For this state the analysis is similar to the state $p_0$, switching between 0 and 1. Note that since the next state is $p_3 = 1$ the decoder decodes the bit which was sent. If, on the other hand, the output is 1 we move to the state $z_t = p_1$. Assuming $z_t = p_1$ we have the following:

1) Decoder: Using the definition of $z_t$ we deduce that $p(s_t = 0 | y^t) = p_1$ and hence $x_t = 0$ with probability $p_1$. Thus, the decoder does not decode and waits for the next bit.

2) Encoder: The optimal actions are $\delta^*(p_1) = p_1$ and $\gamma^*(p_1) = 1 - p_1$. Using the definition of $\gamma^*$ we conclude that $\Pr(x_{t+1} = 1 | s_t = 1) = 1$ and using the definition of $\delta^*$ we conclude that $\Pr(x_{t+1} = 0 | s_t = 0) = 1$. This means that $x_{t+1} = s_t = x_t$ with probability 1.

The analysis for state $p_2$ is done in a similar way.



[Figure 6: coding graph with three vertices. DP state $p_0$ (De: $p(x_t = 0 | y^t) = 0$; En: $\Pr(x_{t+1} = x_t) = a$). DP states $p_1, p_2$ (De: waits; En: $x_{t+1} = x_t$). DP state $p_3$ (De: $p(x_t = 0 | y^t) = 1$; En: $\Pr(x_{t+1} = x_t) = a$). Each arc is labeled with the channel output $y_{t+1} = 0$ or $y_{t+1} = 1$.]

Fig. 6. The coding graph for the capacity-achieving coding scheme at time $t$. The states $p_i$, $i = 0, 1, 2, 3$, are the DP states, where $p_0 = 0$, $p_3 = 1$. The labels on the arcs represent the output of the channel at time $t + 1$. The decoder and the encoder rules, which are written in the vertices of the graph, yield the coding scheme presented in Theorem 2.



We can now create a coding graph for the capacity-achieving coding scheme. Decoding only when the states are $p_0$ or $p_3$ results in zero-error decoding. The coding graph is presented in Fig. 6. In the figure we have three vertices, which correspond to the DP states. At each vertex we mention the corresponding state or states, the decoder action, and the encoder action. The edges' labels are the output of the channel. The edges from vertex $p_0$ to $p_3$ and vice versa correspond to case (1.1) in Theorem 2 in the encoder scheme and to case (1.1) in Theorem 2 in the decoder scheme. The edges between vertices $p_0$ and $p_1, p_2$ and between $p_3$ and $p_1, p_2$ correspond to case (1.2) in Theorem 2 in the encoder scheme and to case (1.2) in Theorem 2 in the decoder scheme.

VIII. Capacity-Achieving Coding Scheme Analysis 

In this section we show that the coding scheme presented in Theorem 2 and in the previous section indeed achieves the capacity. In order to analyze the coding scheme, we would like to present the same coding scheme in a different way. One can see that the scheme presented in Theorem 2 resembles the zero-error coding scheme for the Ising channel without feedback, which was found by Berger and Bonomi [4]. The zero-error capacity-achieving coding scheme for the Ising channel without feedback is simple: the encoder sends each bit twice and the decoder considers every other bit. The channel topology ensures that every second bit is correct with probability 1. Moreover, it is clear that this scheme achieves the zero-error capacity of the Ising channel without feedback, which is 0.5 bit per channel use. Thus, one can see the intuition behind the coding scheme suggested in Theorem 2.

The following proof shows that the coding scheme presented in Theorem 2 indeed achieves the capacity. In the proof we calculate the expected length of strings in the channel input and divide it by the expected length of strings in the channel output.

Proof of Theorem 2: Let us consider an encoder that contains two blocks, as in Fig. 7. The first block is a data encoder. The data encoder receives a message $M^n$ ($\mathcal{M} = \{0, 1\}$) of length $n$ distributed i.i.d. Bernoulli($\frac{1}{2}$) and transfers it to a string of data with probability of alternation between 0 and 1 and vice versa of $q$. This means that if some bit is 0 (alternatively 1), the next bit is 1 (alternatively 0) with probability $q$. In order to create a one-to-one correspondence between the messages and the data strings we need the data strings to be longer than $n$.

Let us calculate the length of the data strings needed in order to transform the message into a string with alternation probability of $q$. We notice that $p(x_t | x^{t-1}) = p(x_t | x_{t-1})$. Thus, the entropy rate is

$$\lim_{n \to \infty} \frac{1}{n} H(X^n) \stackrel{(a)}{=} \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i | X_{i-1}) = H(X_i | X_{i-1}) \stackrel{(b)}{=} H_b(q) \qquad (40)$$

where

• (a) is due to the chain rule and since $p(x_t | x^{t-1}) = p(x_t | x_{t-1})$.

• (b) is due to the fact that the probability of alternation is $q$.

Therefore, given a message of length $n$, the data encoder transfers it into a data string of length $\frac{n}{H_b(q)}$. This can be done using the method of types for binary sequences. Given a probability $q$ and binary sequences of length $n$, the size of the typical set is about $2^{nH_b(q)}$. Hence, we can map the set of Bernoulli($\frac{1}{2}$) sequences of length $n$ (which is of size $2^n$) to the set of sequences of length $\frac{n}{H_b(q)}$ with alternation probability $q$ (which is of size $2^{\frac{n}{H_b(q)} H_b(q)} = 2^n$). One can also use the mapping presented in [26]. This mapping gives a simple way to enumerate the indexes of Markov sequences of length $\frac{n}{H_b(q)}$ and of binary sequences of length $n$, which are distributed Bernoulli($\frac{1}{2}$). One can enumerate these sequences and establish a mapping from the Bernoulli sequences to the Markov sequences simply by matching their indexes.

[Figure 7: block diagram. The Data Encoder outputs $d_t$ into the Channel Encoder, which outputs $x_t$; the feedback $y_{t-1}$ enters the encoder.]

Fig. 7. The channel encoder block, which consists of two sub-encoders. One block encodes data and the other performs the channel encoding.

The second block is the channel encoder. This encoder receives a data string in which the probability of alternation between 0 and 1 and vice versa is $q$. This sequence passes through the encoder, which sends some bits once and some bits twice. Due to that property, the transmitted bit at time $t$ is not necessarily the data bit at the $t$th location. This is why the encoder scheme uses two time indexes, $t$ and $t'$, which denote the data bit location and the current transmission time, respectively. The encoder works as mentioned in Theorem 2.

(1) Encoder: At time $t'$, the encoder knows $s_{t'-1} = x_{t'-1}$; it is clear from the encoder description that $x_{t'-1} = m_{t-1}$ and we send the bit $m_t$ ($x_{t'} = m_t$):

(1.1) If $y_{t'} \neq s_{t'-1}$ then move to the next bit, $m_{t+1}$. This means that we send $m_t$ once. The probability that $y_{t'} \neq s_{t'-1}$ is the probability that $x_{t'} \neq x_{t'-1}$ and $y_{t'} = x_{t'}$, namely $q \cdot \frac{1}{2} = \frac{q}{2}$.

(1.2) If $y_{t'} = s_{t'-1}$ then $x_{t'} = x_{t'+1} = m_t$, which means that the encoder sends $m_t$ twice (at times $t'$ and $t' + 1$) and then moves to the next bit. The probability that $y_{t'} = s_{t'-1}$ is the probability that $x_{t'} = x_{t'-1}$, or $x_{t'} \neq x_{t'-1}$ and $y_{t'} \neq x_{t'}$, namely $(1 - q) + \frac{q}{2} = \frac{2-q}{2}$.

Now we calculate the expected length of the channel encoder output string. First, the message is of length $n$ and distributed Bernoulli($\frac{1}{2}$). Thus, the length of the string which has alternation probability of $q$ is $\frac{n}{H_b(q)}$. Then, with probability $\frac{2-q}{2}$ we send two bits and with probability $\frac{q}{2}$ we send one bit. Hence, the expected length of the channel encoder output string is $\frac{n}{H_b(q)}\left(\frac{q}{2} + 2 \cdot \frac{2-q}{2}\right) = \frac{n(4-q)}{2H_b(q)}$.

We send $2^n$ messages in $\frac{n(4-q)}{2H_b(q)}$ transmissions, hence the rate is $\frac{n}{n(4-q)/(2H_b(q))} = \frac{2H_b(q)}{4-q}$. Setting $q = 1 - a$ we achieve the rate $\frac{2H_b(1-a)}{3+a} = \frac{2H_b(a)}{3+a}$. This is true for any $a \in [0, 1]$; in particular it holds for the unique positive root in $[0, 1]$ of the polynomial $x^4 - 5x^3 + 6x^2 - 4x + 1$. Using Theorem 1, the expression $\frac{2H_b(a)}{3+a}$ is equal to the capacity of the Ising channel with feedback. This means that the scheme achieves the capacity. ■
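The rate computation can be cross-checked with a short simulation. The sketch below is an illustration under stated assumptions, not the paper's procedure: it draws a Markov data string with alternation probability $q$, applies cases (1.1) and (1.2) of the encoder, and counts channel uses per data bit. Function names, the seed, and the sample size are hypothetical choices.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def scheme_rate(q):
    """Rate 2*H_b(q)/(4 - q): entropy per data bit over expected channel uses per data bit."""
    uses_per_data_bit = (q / 2) * 1 + (1 - q / 2) * 2
    return binary_entropy(q) / uses_per_data_bit


def simulate_uses_per_data_bit(q, num_bits=200_000, seed=0):
    """Monte Carlo estimate of the number of channel uses spent per data bit."""
    rng = random.Random(seed)
    state = 0  # previous channel input, equal to the previous data bit
    uses = 0
    for _ in range(num_bits):
        bit = state ^ 1 if rng.random() < q else state  # alternate with probability q
        if bit != state and rng.random() < 0.5:
            uses += 1  # output equals the new bit and differs from the state: case (1.1)
        else:
            uses += 2  # output equals the state, so the bit is sent twice: case (1.2)
        state = bit
    return uses / num_bits


q = 1 - 0.4503
rate = scheme_rate(q)
estimated_uses = simulate_uses_per_data_bit(q)
```

Here `rate` should be close to the quoted capacity of about 0.5755, and `estimated_uses` close to $\frac{4-q}{2}$.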

The following illustration shows the channel encoding and decoding scheme. Assume we are to send the data string 0110 and the state at time $t' = 0$ is 0. Following the channel encoder rules, we present the encoder output, $x_{t'}$, in Table VII. The table shows at each time, $t' = 1, 2, 3, \dots$, the channel input $x_{t'}$ and output $y_{t'}$. The right-most columns of the table refer to the data string bit that is currently encoded (denoted with the time index $t$ in the decoder scheme), from first to fourth, and to the case that applies. In this example the input to the channel is 0011100 and the output of the channel is 0011110. Note that in the 3rd step, the channel state is 0 and the channel input is 1; since the channel output was also 1 this bit was sent only once. Now we follow the decoder rules in order to decode the received word 0011110. We recall that the decoder rules are as follows:

(1) Decoder: At time $t'$, assume the state $s_{t'-1}$ is known at the decoder and we are to decode the bit $\hat{m}_t$.

(1.1) If $y_{t'} \neq s_{t'-1}$ then $\hat{m}_t = y_{t'}$ and $s_{t'} = y_{t'}$.

(1.2) If $y_{t'} = s_{t'-1}$ then wait for $y_{t'+1}$, $\hat{m}_t = y_{t'+1}$ and $s_{t'} = y_{t'+1}$.

Table VIII presents the decoder decisions made at each time $t'$. The second column represents the channel output (the decoder input), $y_{t'}$. In the third column, the channel state, $s_{t'}$, from the decoder's point of view is presented. The question mark stands for an unknown channel state. In this situation, the decoder cannot decode the bit that was sent and it has to wait for the next bit. The action column records, at each time, the action made by the decoder, which can decode or take no action and wait for the next bit to arrive. The decoded word column presents the string decoded so far. Following the decoder rules, we have decoded the correct word 0110. As we can see, using this coding scheme we can decode the word instantaneously with no errors.

TABLE VII

Example of encoding the word 0110, where the channel output is assumed according to the channel topology. The "encoded bit" represents the bit we currently encode (1, 2, 3, 4).

time t'   channel state s_{t'-1} = x_{t'-1}   channel input x_{t'}   channel output y_{t'}   encoded bit t   case
1         0                                   0                      0 (w.p. 1)              1               1.2
2         0                                   0                      0 (w.p. 1)              1               1.2
3         0                                   1                      1 (w.p. 1/2)            2               1.1
4         1                                   1                      1 (w.p. 1)              3               1.2
5         1                                   1                      1 (w.p. 1)              3               1.2
6         1                                   0                      1 (w.p. 1/2)            4               1.2
7         0                                   0                      0 (w.p. 1)              4               1.2

TABLE VIII

Example of decoding the word 0011110 using the decoding rules. The channel state is given from the decoder's point of view. Since the decoder decodes only when the state is known with probability 1, we denote bits we cannot yet decode by a question mark.

time t'   channel output y_{t'}   channel state s_{t'}   action   decoded word   case
1         0                       ?                      none     ?              1.2
2         0                       0                      decode   0              1.2
3         1                       1                      decode   01             1.1
4         1                       ?                      none     01?            1.2
5         1                       1                      decode   011            1.2
6         1                       ?                      none     011?           1.2
7         0                       0                      decode   0110           1.2
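The encoder and decoder rules can be written as a short program. The following is a minimal sketch, assuming a scripted Ising channel whose random outputs are supplied as a sequence of coin flips; the names `make_ising_channel`, `random_flips`, `encode`, and `decode` are hypothetical and not from the paper.

```python
import itertools
import random


def make_ising_channel(coin_flips):
    """Ising channel: output equals the input if it repeats, else the next coin flip."""
    flips = iter(coin_flips)

    def channel(prev_input, x):
        return x if x == prev_input else next(flips)

    return channel


def random_flips(seed):
    """Endless stream of fair coin flips (used when the outputs are not scripted)."""
    rng = random.Random(seed)
    return (rng.getrandbits(1) for _ in itertools.count())


def encode(data_bits, channel, initial_state=0):
    """Apply encoder cases (1.1)/(1.2); return the channel inputs and outputs."""
    state = initial_state
    inputs, outputs = [], []
    for m in data_bits:
        y = channel(state, m)
        inputs.append(m)
        outputs.append(y)
        if y == state:  # case (1.2): send the same bit a second time
            inputs.append(m)
            outputs.append(channel(m, m))
        state = m       # in case (1.1) the bit was sent once and we move on
    return inputs, outputs


def decode(outputs, initial_state=0):
    """Apply decoder cases (1.1)/(1.2) to the received word."""
    state = initial_state
    decoded = []
    i = 0
    while i < len(outputs):
        if outputs[i] != state:      # case (1.1): decode immediately
            decoded.append(outputs[i])
            state = outputs[i]
            i += 1
        elif i + 1 < len(outputs):   # case (1.2): wait for the next output
            decoded.append(outputs[i + 1])
            state = outputs[i + 1]
            i += 2
        else:
            break                    # trailing output with nothing to wait for
    return decoded


# Reproduce the example: both random outputs (steps 3 and 6) are assumed to be 1.
channel = make_ising_channel([1, 1])
inputs, outputs = encode([0, 1, 1, 0], channel)
decoded = decode(outputs)
```

With the two scripted coin flips, `inputs` is 0011100, `outputs` is 0011110, and `decoded` is 0110, matching Tables VII and VIII.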

An interesting fact is that in order to achieve the capacity using this coding scheme, we do not need to use the feedback continuously. It is enough to use the feedback only when there is an alternation between 0 and 1 (or vice versa) in the bits we send. When there is no alternation, the feedback is not needed since the bit is sent twice regardless of the channel output. Several cases of partial feedback use are studied in [27].



IX. Conclusions

We have derived the capacity of the Ising channel with feedback, analyzed it and presented a simple capacity-achieving coding scheme. As an immediate result of this work we can tighten the upper bound for the capacity of the one-dimensional Ising channel to be 0.575522, since the capacity of a channel without feedback cannot exceed the capacity of the same channel with feedback.

A DP method is used in order to find the capacity of the Ising channel with feedback. In the case presented in this paper, we have also established a connection between the DP results and the capacity-achieving coding scheme. An interesting question that arises is whether there exists a general method for finding the capacity for two-state channels with feedback, whose states are a function of the previous state, the input, and the previous output. It may be the case that the solution of the DP for such a channel has a fixed pattern. Towards this goal, a new coding scheme is provided in [28] for unifilar finite state channels that is based on posterior matching.

Appendix

Proof of Lemma 5: The function $\eta(x)$ is continuous by definition, since $f$ and $g$ are continuous and $f(\beta) = g(\beta)$. We continue the function $f(x)$ on $[\beta, \gamma]$ with a straight line with incline $f'_-(\beta)$ and the function $g(x)$ on $[\alpha, \beta]$ with a straight line with incline $g'_+(\beta)$, as in Fig. 8. We define

$$f_1(x) = \begin{cases} f(x), & \text{if } x \in [\alpha, \beta] \\ (x - \beta)f'_-(\beta) + f(\beta), & \text{if } x \in [\beta, \gamma] \end{cases} \qquad (41)$$

and

$$g_1(x) = \begin{cases} (x - \beta)g'_+(\beta) + g(\beta), & \text{if } x \in [\alpha, \beta] \\ g(x), & \text{if } x \in [\beta, \gamma]. \end{cases} \qquad (42)$$

Fig. 8. Examples for concatenation of two continuous, concave functions. It is easy to see from the figures the intuition behind Lemma 5. Function 1 is not concave due to the fact that $f'_-(\beta) < g'_+(\beta)$. Function 2 is concave since $f'_-(\beta) = g'_+(\beta)$.

The functions $f_1(x)$ and $g_1(x)$ are concave since we continue the functions with a straight line. Since $f(x)$ and $g(x)$ are concave we have $f(x) \le (x - \beta)f'_-(\beta) + f(\beta)$ for all $x \in [\alpha, \beta]$ and $g(x) \le (x - \beta)g'_+(\beta) + g(\beta)$ for all $x \in [\beta, \gamma]$. Hence, for all $x \in [\alpha, \beta]$ we have $g_1(x) \ge f(x)$ and for all $x \in [\beta, \gamma]$ we have $f_1(x) \ge g(x)$. Therefore, $\eta(x) = \min\{f_1(x), g_1(x)\}$ and, since the minimum of two concave functions is a concave function [29], $\eta(x)$ is concave. ■
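Proof of Lemma 6: We now show that the expression

$$H_b\left(\frac{1}{2} + \frac{\delta-\gamma}{2}\right) + \delta + \gamma - 1 + \frac{1+\delta-\gamma}{2}\, h\left(1 + \frac{\delta - z}{1+\delta-\gamma}\right) + \frac{1-\delta+\gamma}{2}\, h\left(\frac{1-z-\gamma}{1+\gamma-\delta}\right)$$

is concave in $(\delta, \gamma)$.

To show this we use the fact that $h(z)$ is a concave function in $z$. Note that, since the binary entropy is concave, the expression $H_b\left(\frac{1}{2} + \frac{\delta-\gamma}{2}\right) + \delta + \gamma - 1$ is concave in $(\delta, \gamma)$. We examine the expression $\frac{1+\delta-\gamma}{2}\, h\left(1 + \frac{\delta - z}{1+\delta-\gamma}\right)$. Let us denote $\eta_i = \frac{1+\delta_i-\gamma_i}{2}$, $i = 1, 2$. For every $\alpha \in [0, 1]$ we obtain that

$$\frac{\alpha\eta_1}{\alpha\eta_1 + (1-\alpha)\eta_2}\, h\left(1 + \frac{\delta_1 - z}{2\eta_1}\right) + \frac{(1-\alpha)\eta_2}{\alpha\eta_1 + (1-\alpha)\eta_2}\, h\left(1 + \frac{\delta_2 - z}{2\eta_2}\right) \stackrel{(i)}{\le} h\left(1 + \frac{\alpha\delta_1 + (1-\alpha)\delta_2 - z}{2(\alpha\eta_1 + (1-\alpha)\eta_2)}\right)$$

where (i) is due to the fact that $h(z)$ is concave. This result implies that

$$\alpha\eta_1\, h\left(1 + \frac{\delta_1 - z}{2\eta_1}\right) + (1-\alpha)\eta_2\, h\left(1 + \frac{\delta_2 - z}{2\eta_2}\right) \le \left(\alpha\eta_1 + (1-\alpha)\eta_2\right) h\left(1 + \frac{\alpha\delta_1 + (1-\alpha)\delta_2 - z}{2(\alpha\eta_1 + (1-\alpha)\eta_2)}\right).$$

Hence, $\frac{1+\delta-\gamma}{2}\, h\left(1 + \frac{\delta - z}{1+\delta-\gamma}\right)$ is concave in $(\delta, \gamma)$. It is completely analogous to show that the expression $\frac{1-\delta+\gamma}{2}\, h\left(\frac{1-z-\gamma}{1+\gamma-\delta}\right)$ is also concave in $(\delta, \gamma)$. Thus, we derive that the expression

$$H_b\left(\frac{1}{2} + \frac{\delta-\gamma}{2}\right) + \delta + \gamma - 1 + \frac{1+\delta-\gamma}{2}\, h\left(1 + \frac{\delta - z}{1+\delta-\gamma}\right) + \frac{1-\delta+\gamma}{2}\, h\left(\frac{1-z-\gamma}{1+\gamma-\delta}\right)$$

is concave in $(\delta, \gamma)$. ■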
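The concavity claim can be spot-checked numerically. The sketch below is only a sanity test under explicit assumptions: it uses the binary entropy itself as a stand-in for the concave function $h$ (the lemma holds for any concave $h$, and this is not the paper's $h^*$), and it tests midpoint concavity at random feasible pairs $(\delta, \gamma)$ with $0 \le \delta \le z$ and $0 \le \gamma \le 1 - z$. Function names and the tolerance are hypothetical.

```python
import math
import random


def hb(p):
    """Binary entropy in bits; the argument is clamped to [0, 1] against rounding."""
    p = min(max(p, 0.0), 1.0)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def lemma6_expr(delta, gamma, z, h=hb):
    """The expression of Lemma 6 for a given concave h (stand-in: binary entropy)."""
    s = 1 + delta - gamma  # twice the probability of output 0
    r = 1 - delta + gamma  # twice the probability of output 1
    return (hb(s / 2) + delta + gamma - 1
            + (s / 2) * h(1 + (delta - z) / s)
            + (r / 2) * h((1 - z - gamma) / r))


def midpoint_concavity_violations(z, trials=20_000, seed=0, tol=1e-9):
    """Count random pairs where f(midpoint) < average of the endpoint values."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        d1, d2 = rng.uniform(0, z), rng.uniform(0, z)
        g1, g2 = rng.uniform(0, 1 - z), rng.uniform(0, 1 - z)
        mid = lemma6_expr((d1 + d2) / 2, (g1 + g2) / 2, z)
        avg = 0.5 * (lemma6_expr(d1, g1, z) + lemma6_expr(d2, g2, z))
        if mid < avg - tol:
            violations += 1
    return violations
```

Finding no violations for interior values of $z$ is consistent with the lemma, although a random test of this kind is of course not a proof.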
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